202 PART 5 Looking for Relationships with Correlation and Regression
You can study correlation and regression for many years and not master all of it.
In this chapter, we cover the kinds of correlation and regression most often
encountered in biological research and explain the differences between them. We
also explain some terminology used throughout Parts 5 and 6.
Correlation: Estimating How Strongly Two Variables Are Associated
Correlation refers to the extent to which two variables are related. In the following
sections, we describe the Pearson correlation coefficient and discuss ways to analyze
correlation coefficients.
Lining up the Pearson correlation coefficient
The Pearson correlation coefficient is represented by the symbol r and measures
the extent to which two variables (X and Y) tend to lie along a straight line when
graphed. If the variables have no relationship, r will be 0 (or, in a sample, close
to 0), and the points will be scattered across the graph. If the relationship is
perfect, the points will lie exactly along a straight line, and r will be either:
» 1: If the variables have a direct or positive relationship, meaning when one
goes up, the other goes up, or
» –1: If the variables have an inverse or negative relationship, meaning when one
goes up, the other goes down
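To make the two limiting values concrete, here is a minimal sketch that computes r from the standard definition, the sum of cross-products of deviations from the means divided by the square root of the product of the sums of squared deviations. The helper name pearson_r and the sample data are hypothetical, chosen only for illustration; any exact straight line gives r = 1 if it slopes upward and r = –1 if it slopes downward.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient (hypothetical helper) of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * xi + 1 for xi in x]))   # direct relationship  -> 1.0
print(pearson_r(x, [10 - 3 * xi for xi in x]))  # inverse relationship -> -1.0
```

Only the direction of the line matters here, not its steepness: the slopes 2 and –3 both give a perfect correlation of the corresponding sign.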
Correlation coefficients can be positive (indicating upward-sloping data) or
negative (indicating downward-sloping data). Figure 15-1 shows what several
different values of r look like.
Note: The Pearson correlation coefficient measures the extent to which the points
lie along a straight line. If your data follow a curved line, the r value may be low or
zero, as shown in Figure 15-2. All three graphs in Figure 15-2 have the same
amount of random scatter in the points, but they have quite different r values.
Pearson r is based on a straight-line relationship and can be misleadingly small
(or even zero) if the relationship is nonlinear. So, you shouldn't interpret r = 0
as evidence of a lack of association or independence between two variables. It
may indicate only the lack of a straight-line relationship between the two variables.